Demystifying Data Science

The Role Data Science Can Play in Healthcare

Paul Johnson

Introducing… Myself?


🧪 Principal Data Scientist - Population Health Analytics

🎓 Previously: Political Scientist - University of Houston, Texas

📈 Specialisms: Research Design, Statistical Inference, Machine Learning

What is Data Science?

Beyond the Buzzwords

A New Field Built on Old Traditions

Figure 1: A monk doing science

Figure 2: Another monk doing science!

The Intersection of Statistics & Engineering


Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.


      — Josh Wills (@josh_wills) May 3, 2012

Jacks of All (Data) Trades

Some common themes:

  • It involves drawing complex insights from large amounts of data
  • It combines elements of statistics, data analytics, software engineering, data engineering, and research science
  • The potential skill set is so wide that this is usually served by multiple people in a data science team

Drawing Complex Insights

AI ≠ Machine Learning ≠ Data Science

  • Data science leverages approaches to AI, including ML and DL, to find solutions to business problems
  • But there are often simpler solutions
  • And there are lots of problems for which ML and DL are not appropriate

It’s All Just Statistics Anyway (It’s Not)…

The Data Science Skillset

What Does a Data Scientist Look Like?

Dall-E Mini’s ‘Data Scientist’

When given the prompt “data scientist”, this is what the GAN Dall-E Mini returns…

Foundational Skills

  • Math - Linear algebra, calculus, probability & statistics
  • Computer Science - Programming, software engineering, data engineering
  • Research - Scientific reasoning & communication, quantitative methods, research design

    … It is more than just coding!

Soft Skills are Equally as Important!

There is more to data science than the technical skills.

For example:

  • Curiosity
  • Creativity
  • Communication Skills
  • Storytelling
  • Critical Thinking

Most of all, a passion for finding solutions to complex problems!

(Some) Domains of Data Science

  • Machine Learning
  • Deep Learning
  • Statistics
  • Data Engineering
  • Software Engineering

Data Science in the NHS

Why do we need it?

Technological Advancements

  • The exponential growth in the availability (and complexity) of data demands a more complex approach than descriptive analytics
    • Statistical models that replicate the complex data generating process
  • The exponential growth in computing power has made these complex statistical models accessible beyond the tech industry and computer science research

Keeping Up with the Joneses

  • Every industry is experiencing these technological advancements
    • Some industries are utilising it slower than others
  • The NHS has a lot of data, and using it effectively serves significant public good
  • Every CSU will realise this (if they haven’t already)

Potential Use Cases

Where Can (and Has) Data Science Been Applied in Healthcare?

Predicting Hospital Readmissions

Hospital readmissions are a significant cost to the NHS, and can be a sign of poor quality of care.

  • Several ways to approach this:
    • Classification
    • Survival analysis
    • Probabilistic forecasting

Could help to identify patients at risk of readmission and improve the quality of care.

Diagnosing Diabetic Retinopathy

Grading retinal images is a high-level pattern recognition task. Perfect for deep learning!

  • Convolutional neural networks are the most common deep learning architecture for image classification (though transformers are becoming more popular)
  • Some common DL frameworks include PyTorch, TensorFlow, Keras, Lightning

Could significantly reduce the grading workload, giving graders more time to focus on more complex cases.

Automating Dashboards & Reports

There’s a lot of data in the NHS, and a lot of reports to be generated from it.

  • Automating the generation of dahsboards andreports can save a lot of time
  • Data pipelines and workflow orchestration tools can be used to automate the process
  • Could be used to generate reports on a regular basis

Could save a lot of time for analysts, giving them time to focus on more complex tasks.

Modelling Excess Deaths From COVID-19

It’s difficult to estimate excess deaths due to multiple factors.

This could help to understand the impact of COVID-19 on the NHS, and could be used to inform future planning and resource allocation.

Thank You!

For more information and to get involved:


For Presentation Code & SCW Data Science Resources: